Abstract. In response to a 1997 problem of M. Vidyasagar, we state a necessary and
sufficient condition for distribution-free PAC learnability of a concept class
$\mathscr{C}$ under the family of all non-atomic (diffuse) measures on the domain
$\Omega$. Clearly, finiteness of the classical Vapnik-Chervonenkis dimension of
$\mathscr{C}$ is a sufficient, but no longer necessary, condition. Besides, learnability
of $\mathscr{C}$ under non-atomic measures does not imply the uniform Glivenko-Cantelli
property with regard to non-atomic measures. Our learnability criterion is stated in terms
of a combinatorial parameter $\mathrm{VC}(\mathscr{C} \bmod \omega_1)$ which we call the
VC dimension of $\mathscr{C}$ modulo countable sets. The new parameter is obtained by
"thickening up" single points in the definition of VC dimension to uncountable "clusters".
Equivalently, $\mathrm{VC}(\mathscr{C} \bmod \omega_1) \le d$ if and only if every countable
subclass of $\mathscr{C}$ has VC dimension $\le d$ outside a countable subset of $\Omega$.
The new parameter can also be expressed as the classical VC dimension of $\mathscr{C}$
calculated on a suitable subset of a compactification of $\Omega$. We do not make any
measurability assumptions on $\mathscr{C}$, assuming instead the validity of Martin's
Axiom (MA).



1 Introduction 

A fundamental result of statistical learning theory says that for a concept class
$\mathscr{C}$ the following three conditions are equivalent: (1) $\mathscr{C}$ is
distribution-free PAC learnable over the family $P(\Omega)$ of all probability measures on
the domain $\Omega$, (2) $\mathscr{C}$ is a uniform Glivenko-Cantelli class with regard to
$P(\Omega)$, and (3) the Vapnik-Chervonenkis dimension of $\mathscr{C}$ is finite
[VC, BEHW]. In this paper we are interested in the problem, discussed by Vidyasagar in both
editions of his book [V1, V2] as problem 12.8, of giving a similar combinatorial
description of concept classes $\mathscr{C}$ which are PAC learnable under the family
$P_{na}(\Omega)$ of all non-atomic probability measures on $\Omega$. (A measure $\mu$ is
non-atomic, or diffuse, if every set $A$ of strictly positive measure contains a subset
$B$ with $0 < \mu(B) < \mu(A)$.)

The condition $\mathrm{VC}(\mathscr{C}) < \infty$, while of course sufficient for
$\mathscr{C}$ to be learnable under $P_{na}(\Omega)$, is not necessary. Let a concept class
$\mathscr{C}$ consist of all finite and all cofinite subsets of a standard Borel space
$\Omega$. Then $\mathrm{VC}(\mathscr{C}) = \infty$, and moreover $\mathscr{C}$ is clearly
not a uniform Glivenko-Cantelli class with regard to non-atomic measures. At the same
time, $\mathscr{C}$ is PAC learnable under non-atomic measures: any learning rule
$\mathcal{L}$ consistent with the subclass $\{\emptyset, \Omega\}$ will learn
$\mathscr{C}$. Notice that $\mathscr{C}$ is not consistently learnable under non-atomic
measures: there are consistent learning rules mapping every training sample to a finite
set, and they will not learn any cofinite subset of $\Omega$.
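
As an informal illustration of this example (a sketch under our own assumptions, not part of the formal argument), the Python fragment below takes the domain to be $[0, 1)$ with the uniform measure. The helper names `make_concept`, `core_rule` and `error_under_uniform` are hypothetical, and the majority vote is just one possible rule consistent with $\{\emptyset, \Omega\}$.

```python
import random

def make_concept(exceptions, cofinite):
    """Membership test for a finite set, or for its complement when cofinite=True."""
    points = frozenset(exceptions)
    return lambda x: (x not in points) if cofinite else (x in points)

def core_rule(labels):
    """Hypothetical rule consistent with {empty set, Omega}: majority vote over labels."""
    return "Omega" if 2 * sum(labels) > len(labels) else "empty"

def error_under_uniform(target_is_cofinite, hypothesis):
    """mu(C symmetric-difference H) for the uniform mu on [0, 1): finite sets are
    null, so the error is 0 when H has the same 'type' as C and 1 otherwise."""
    return 0.0 if (hypothesis == "Omega") == target_is_cofinite else 1.0

rng = random.Random(0)
sample = [rng.random() for _ in range(50)]
results = {}
for cofinite in (False, True):
    concept = make_concept({0.25, 0.5}, cofinite)
    labels = [concept(x) for x in sample]
    hypothesis = core_rule(labels)
    results[cofinite] = (hypothesis, error_under_uniform(cofinite, hypothesis))
print(results)
```

In the same scoring, a consistent rule that always returns a finite set is charged `error_under_uniform(True, "finite set")`, that is, error 1 on every cofinite target, which is the failure of consistent learnability noted above.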



The point of this example is that PAC learnability of a concept class $\mathscr{C}$ under
non-atomic measures is not affected by adding to $\mathscr{C}$ symmetric differences
$C \triangle N$ for each $C \in \mathscr{C}$ and every countable set $N$.

A version of VC dimension oblivious to this kind of set-theoretic "noise" is obtained
from the classical definition by "thickening up" individual points and replacing them
with uncountable clusters (Figure 1).




Define the VC dimension of a concept class $\mathscr{C}$ modulo countable sets as the
supremum of natural $n$ for which there exists a family of $n$ uncountable sets,
$A_1, A_2, \ldots, A_n \subseteq \Omega$, shattered by $\mathscr{C}$ in the sense that for
each $J \subseteq \{1, 2, \ldots, n\}$, there is $C \in \mathscr{C}$ which contains all
sets $A_i$, $i \in J$, and is disjoint from all sets $A_j$, $j \notin J$. Denote this
parameter by $\mathrm{VC}(\mathscr{C} \bmod \omega_1)$. Clearly, for every concept class
$\mathscr{C}$

$\mathrm{VC}(\mathscr{C} \bmod \omega_1) \le \mathrm{VC}(\mathscr{C})$.

In our example above, one has $\mathrm{VC}(\mathscr{C} \bmod \omega_1) = 1$, even as
$\mathrm{VC}(\mathscr{C}) = \infty$.
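
The shattering condition on clusters can be checked mechanically in a finite toy model. The sketch below is only an illustration: finite sets stand in for the uncountable clusters $A_i$, and the function name `shatters_clusters` and the sample data are our own assumptions.

```python
from itertools import combinations

def shatters_clusters(concepts, clusters):
    """True if for every index set J some concept contains each cluster A_i, i in J,
    and is disjoint from each cluster A_j, j not in J."""
    n = len(clusters)
    for r in range(n + 1):
        for J in combinations(range(n), r):
            chosen = set(J)
            if not any(
                all(clusters[i] <= c if i in chosen else clusters[i].isdisjoint(c)
                    for i in range(n))
                for c in concepts
            ):
                return False
    return True

# Finite stand-ins: two "clusters" and a class realizing all four patterns.
A1, A2 = frozenset({0, 1, 2}), frozenset({3, 4, 5})
concepts = [frozenset(), A1 | {9}, A2, A1 | A2]
print(shatters_clusters(concepts, [A1, A2]))      # all four patterns are present
print(shatters_clusters(concepts[:3], [A1, A2]))  # the pattern J = {1, 2} is missing
```

Note that the extra point `9` in the second concept does no harm: only containment of, and disjointness from, whole clusters matters.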
Here is our main result. 

Theorem 1. Let $(\Omega, \mathscr{A})$ be a standard Borel space, and let
$\mathscr{C} \subseteq \mathscr{A}$ be a concept class. Under Martin's Axiom (MA), the
following are equivalent.

1. $\mathscr{C}$ is PAC learnable under the family of all non-atomic measures.

2. $\mathrm{VC}(\mathscr{C} \bmod \omega_1) = d < \infty$.

3. Every countable subclass $\mathscr{C}' \subseteq \mathscr{C}$ has finite VC dimension on
the complement to some countable subset of $\Omega$ (which depends on $\mathscr{C}'$).

4. There is $d$ such that for every countable $\mathscr{C}' \subseteq \mathscr{C}$ one has
$\mathrm{VC}(\mathscr{C}') \le d$ on the complement to some countable subset of $\Omega$
(depending on $\mathscr{C}'$).

5. Every countable subclass $\mathscr{C}' \subseteq \mathscr{C}$ is a uniform
Glivenko-Cantelli class with regard to the family of non-atomic measures.

6. Same, with sample complexity $s(\epsilon, \delta)$ which only depends on $\mathscr{C}$
and not on $\mathscr{C}'$.

If $\mathscr{C}$ is universally separable [P], the above are also equivalent to:

7. The VC dimension of $\mathscr{C}$ is finite outside of a countable subset of $\Omega$.

8. $\mathscr{C}$ is a uniform Glivenko-Cantelli class with respect to the family of
non-atomic probability measures.



Martin's Axiom (MA) [F] is one of the most often used and best studied additional
set-theoretic assumptions beyond the standard Zermelo-Fraenkel set theory with the Axiom
of Choice (ZFC). In particular, Martin's Axiom follows from the Continuum Hypothesis (CH),
but it is also compatible with the negation of CH, and in fact it is precisely the
combination MA + $\neg$CH that is really interesting.

The concept class in our initial simple example (which is even image admissible
Souslin [P]) shows that in general (7) and (8) are not equivalent to the remaining
conditions. Notice that for universally separable classes, (1), (7) and (8) are equivalent
without additional set-theoretic assumptions.

The core of the theorem, and the main technical novelty of our paper, is the proof of the
implication (3)$\Rightarrow$(1). It is based on a special choice of a consistent learning
rule $\mathcal{L}$ having the property that for every concept $C \in \mathscr{C}$ the image
of all learning samples of the form $(\sigma, C \cap \sigma)$ under $\mathcal{L}$ forms a
uniform Glivenko-Cantelli class. It is for establishing this property of $\mathcal{L}$ that
we need Martin's Axiom.

Most of the remaining implications are relatively straightforward adaptations of the
standard techniques of statistical learning. Nevertheless, (2)$\Rightarrow$(3) requires a
certain technical dexterity, and we study this implication in the setting of Boolean
algebras.

We begin the paper by reviewing a general formal setting, followed by a discussion of
Boolean algebras, which seem like a natural framework for the problem at hand, especially
in view of possible generalizations to learning under other intermediate families of
measures.

In particular, we will show that our version of the VC dimension modulo countable sets,
$\mathrm{VC}(\mathscr{C} \bmod \omega_1)$, is just the usual VC dimension of the class
$\mathscr{C}$ of concepts extended over a suitable compactification of $\Omega$ and
restricted to a certain subdomain of the compactification.

Now the part of Theorem 1 for universally separable concept classes follows easily.
Afterwards, we discuss Martin's Axiom, prove the existence of a learning rule with the
above special property, and deduce Theorem 1 for arbitrary concept classes.

2 The setting 

We need to fix a precise setting, which is mostly standard. The domain (instance space)
$\Omega = (\Omega, \mathscr{A})$ is a measurable space, that is, a set $\Omega$ equipped
with a sigma-algebra of subsets $\mathscr{A}$. Typically, $\Omega$ is assumed to be a
standard Borel space, that is, a complete separable metric space equipped with the
sigma-algebra of Borel subsets. We will clarify the assumption whenever necessary.

A concept class is a family, $\mathscr{C}$, of measurable subsets of $\Omega$.
(Equivalently, $\mathscr{C}$ can be viewed as a family of measurable $\{0, 1\}$-valued
functions on $\Omega$.)

In the learning model, a set $\mathcal{P}$ of probability measures on $\Omega$ is fixed.
Usually either $\mathcal{P} = P(\Omega)$ is the set of all probability measures
(distribution-free learning), or $\mathcal{P} = \{\mu\}$ is a single measure (learning
under fixed distribution). In our article, the case of interest is the family
$\mathcal{P} = P_{na}(\Omega)$ of all non-atomic measures.

Every probability measure $\mu$ on $\Omega$ defines a distance $d_\mu$ on $\mathscr{A}$ as
follows:

$d_\mu(A, B) = \mu(A \triangle B)$.



We will not distinguish between a measure $\mu$ and its Lebesgue completion, that is, an
extension of $\mu$ over the larger sigma-algebra of Lebesgue measurable subsets of
$\Omega$. Consequently, we will sometimes use the term measurability meaning Lebesgue
measurability. No confusion can arise here.

Often it is convenient to approximate the concepts from $\mathscr{C}$ with elements of the
hypothesis space, $\mathscr{H}$, which is, technically, a subfamily of $\mathscr{A}$ whose
closure with regard to each (pseudo)metric $d_\mu$, $\mu \in \mathcal{P}$, contains
$\mathscr{C}$. However, in our article we make no distinction between $\mathscr{H}$ and
$\mathscr{C}$.

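A learning sample is a pair $s = (\sigma, \tau)$ of finite subsets of $\Omega$, where
$\tau \subseteq \sigma$. It is convenient to assume that elements
$x_1, x_2, \ldots, x_n \in \sigma$ are ordered, and thus the set of all samples
$(\sigma, \tau)$ with $|\sigma| = n$ can be identified with
$(\Omega \times \{0, 1\})^n$. A learning rule (for $\mathscr{C}$) is a mapping

$\mathcal{L} \colon \bigcup_{n=1}^{\infty} \Omega^n \times \{0, 1\}^n \to \mathscr{C}$

which satisfies the following measurability condition: for every $C \in \mathscr{C}$ and
$\mu \in \mathcal{P}$, the function

$\Omega^n \ni \sigma \mapsto \mu(\mathcal{L}(\sigma, C \cap \sigma) \triangle C) \in \mathbb{R}$    (1)

is measurable.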

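A learning rule $\mathcal{L}$ is consistent (with $\mathscr{C}$) if for every
$C \in \mathscr{C}$ and each $\sigma \in \Omega^n$ one has

$\mathcal{L}(\sigma, C \cap \sigma) \cap \sigma = C \cap \sigma$.

A learning rule $\mathcal{L}$ is probably approximately correct (PAC) under $\mathcal{P}$
if for every $\epsilon > 0$

$\sup_{\mu \in \mathcal{P}} \sup_{C \in \mathscr{C}} \mu^{\otimes n}\{\sigma \in \Omega^n : \mu(\mathcal{L}(\sigma, C \cap \sigma) \triangle C) > \epsilon\} \to 0$ as $n \to \infty$.    (2)

Here $\mu^{\otimes n}$ denotes the (Lebesgue extension of the) product measure on
$\Omega^n$. Now the origin of the measurability condition (1) on the mapping $\mathcal{L}$
is clear: it is implicit in (2).

Equivalently, there is a function $s(\epsilon, \delta)$ (sample complexity of
$\mathcal{L}$) such that for each $C \in \mathscr{C}$ and every $\mu \in \mathcal{P}$ an
i.i.d. sample $\sigma$ with $\ge s(\epsilon, \delta)$ points has the property
$\mu(C \triangle \mathcal{L}(\sigma, C \cap \sigma)) < \epsilon$ with confidence
$\ge 1 - \delta$.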
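
To make condition (2) concrete, here is a minimal Monte Carlo sketch for a toy class that does not appear in the paper: thresholds $[0, t)$ on $[0, 1)$ under the uniform measure. The rule `consistent_threshold_rule` and the function `failure_rate` are our own hypothetical names; the estimate approximates the inner probability in (2) for one fixed concept and measure.

```python
import random

def consistent_threshold_rule(sample, labels):
    """Return t_hat such that [0, t_hat) agrees with the labels: the smallest
    negatively labelled point, or 1.0 if every point is labelled positive."""
    negatives = [x for x, y in zip(sample, labels) if not y]
    return min(negatives) if negatives else 1.0

def failure_rate(t, n, eps, trials, rng):
    """Monte Carlo estimate of the probability that the error exceeds eps,
    for the concept C = [0, t) and the uniform measure on [0, 1)."""
    failures = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        labels = [x < t for x in sample]
        t_hat = consistent_threshold_rule(sample, labels)
        if t_hat - t > eps:  # measure of the symmetric difference [t, t_hat)
            failures += 1
    return failures / trials

rng = random.Random(1)
rates = [failure_rate(t=0.3, n=n, eps=0.1, trials=2000, rng=rng) for n in (5, 20, 80)]
print(rates)
```

For this toy rule the failure event is exactly "no sample point falls in $[t, t + \epsilon)$", so the estimates should track $(1 - \epsilon)^n$ and decrease to zero as $n$ grows.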

A concept class $\mathscr{C}$ consisting of measurable sets is PAC learnable under
$\mathcal{P}$ if there exists a PAC learning rule for $\mathscr{C}$ under $\mathcal{P}$. A
class $\mathscr{C}$ is consistently learnable (under $\mathcal{P}$) if every learning rule
consistent with $\mathscr{C}$ is PAC under $\mathcal{P}$. If $\mathcal{P} = P(\Omega)$ is
the set of all probability measures, then $\mathscr{C}$ is said to be (distribution-free)
PAC learnable. At the same time, learnability under intermediate families of measures on
$\Omega$ has received considerable attention, cf. Chapter 7 in [V2].

Notice that in this paper, we only talk of potential PAC learnability, adopting a 
purely information-theoretic viewpoint. 

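A closely related concept is that of a uniform Glivenko-Cantelli concept class with regard
to a family of measures $\mathcal{P}$, that is, a concept class $\mathscr{C}$ such that for
each $\epsilon > 0$

$\sup_{\mu \in \mathcal{P}} \mu^{\otimes n}\left\{ \sup_{C \in \mathscr{C}} |\mu(C) - \mu_n(C)| > \epsilon \right\} \to 0$ as $n \to \infty$.    (3)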



(Cf. Iini, Ch. 3; [M].) Here $\mu_n$ stands for the empirical (uniform) measure on $n$
points, sampled in an i.i.d. fashion from $\Omega$ according to the distribution $\mu$. One
also says that $\mathscr{C}$ has the property of uniform convergence of empirical measures
(UCEM property) with regard to $\mathcal{P}$ [V2].

Every uniform Glivenko-Cantelli class (with regard to $\mathcal{P}$) is PAC learnable
(under $\mathcal{P}$), and in the distribution-free situation, the converse is true as
well. Already in the case of learning under a single measure, it is not so: a PAC learnable
class under a single distribution $\mu$ need not be uniform Glivenko-Cantelli with regard
to $\mu$ (cf. Chapter 6 in [V2]). Not every PAC learnable class under non-atomic measures
is uniform Glivenko-Cantelli with regard to non-atomic measures either: the class
consisting of all finite and all cofinite subsets of $\Omega$ is a counter-example.
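
The failure of uniform convergence for this counter-example can be seen directly: the sample itself is a finite set, hence a concept of measure zero whose empirical measure equals one. The following sketch, with the uniform measure on $[0, 1)$ and the helper `empirical_measure` as our own assumptions, computes the resulting deviation $|\mu(C) - \mu_n(C)|$.

```python
import random

def empirical_measure(sample, concept):
    """mu_n(C): the fraction of sample points lying in the concept (a membership test)."""
    return sum(1 for x in sample if concept(x)) / len(sample)

rng = random.Random(2)
deviations = []
for n in (10, 100, 1000):
    sample = [rng.random() for _ in range(n)]
    points = frozenset(sample)
    finite_concept = lambda x, pts=points: x in pts  # C = the sample itself
    mu_C = 0.0                                       # finite sets are null for uniform mu
    deviations.append(abs(mu_C - empirical_measure(sample, finite_concept)))
print(deviations)
```

The deviation stays equal to 1 for every sample size, so the supremum in (3) cannot tend to zero for this class.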

We say, following Pollard [P], that a concept class $\mathscr{C}$ consisting of measurable
sets is universally separable if it contains a countable subfamily $\mathscr{C}'$ with the
property that every $C \in \mathscr{C}$ is a pointwise limit of a suitable sequence
$(C_n)_{n=1}^{\infty}$ of sets from $\mathscr{C}'$: for every $x \in \Omega$ there is $N$
with the property that, for all $n \ge N$, $x \in C_n$ if $x \in C$, and $x \notin C_n$ if
$x \notin C$. Such a family $\mathscr{C}'$ is said to be universally dense in
$\mathscr{C}$.

Probably the main source of uniform Glivenko-Cantelli classes is the finiteness of VC
dimension. Assume that $\mathscr{C}$ satisfies a suitable measurability condition, for
instance, $\mathscr{C}$ is image admissible Souslin, or else universally separable. (In
particular, a countable $\mathscr{C}$ satisfies either condition.) If
$\mathrm{VC}(\mathscr{C}) = d < \infty$, then $\mathscr{C}$ is uniform Glivenko-Cantelli,
with a sample complexity bound that does not depend on $\mathscr{C}$, but only on
$\epsilon$, $\delta$, and $d$. The following is a typical (and far from being optimal) such
estimate, which can be deduced, for instance, along the lines of [M]:

s(e, 6,d)<^-^ [dlog log + log ^ j . (4) 

For our purposes, we will fix any such bound and refer to it as a "standard" sample
complexity estimate for $s(\epsilon, \delta, d)$.

A subset $N \subseteq \Omega$ is universal null if for every non-atomic probability measure
$\mu$ on $(\Omega, \mathscr{A})$ one has $\mu(N') = 0$ for some Borel set $N'$ containing
$N$. Universal null Borel sets are just countable sets.

3 VC dimension and Boolean algebras 

Recall that a Boolean algebra, $B = (B, \wedge, \vee, \neg, 0, 1)$, consists of a set, $B$,
equipped with two associative and commutative binary operations, $\wedge$ ("meet") and
$\vee$ ("join"), which are distributive over each other and satisfy the absorption
principles $a \vee (a \wedge b) = a$, $a \wedge (a \vee b) = a$, as well as a unary
operation $\neg$ (complement), and two elements $0$ and $1$, satisfying
$a \vee \neg a = 1$, $a \wedge \neg a = 0$.

For instance, the family $2^\Omega$ of all subsets of a set $\Omega$, with the union as
join, intersection as meet, the empty set as $0$ and $\Omega$ as $1$, as well as the
set-theoretic complement $\neg A = A^c$, forms a Boolean algebra. In fact, every Boolean
algebra can be realized as an algebra of subsets of a suitable $\Omega$. Even better,
according to the Stone representation theorem, a Boolean algebra $B$ is isomorphic to the
Boolean algebra formed by all open-and-closed subsets of a suitable compact space, $S(B)$,
called the Stone space of $B$, where the Boolean algebra operations are interpreted
set-theoretically as above.

The space $S(B)$ can be obtained in different ways. For instance, one can think of elements
of $S(B)$ as Boolean algebra homomorphisms from $B$ to the two-element Boolean algebra
$\{0, 1\}$ (the algebra of subsets of a singleton). In this way, $S(B)$ is a closed
topological subspace of the compact zero-dimensional space $\{0, 1\}^B$ with the usual
Tychonoff product topology.

The Stone space of the Boolean algebra $B = 2^\Omega$ is known as the Stone-Čech
compactification of $\Omega$, and is denoted $\beta\Omega$. The elements of $\beta\Omega$
are ultrafilters on $\Omega$. A collection $\xi$ of non-empty subsets of $\Omega$ is an
ultrafilter if it is closed under finite intersections and if for every subset
$A \subseteq \Omega$ either $A \in \xi$ or $A^c \in \xi$. To every point $x \in \Omega$
there corresponds a trivial (principal) ultrafilter, $\bar{x}$, consisting of all sets $A$
containing $x$. However, if $\Omega$ is infinite, the Axiom of Choice assures that there
exist non-principal ultrafilters on $\Omega$. Basic open sets in the space $\beta\Omega$
are of the form $\bar{A} = \{\xi \in \beta\Omega : A \in \xi\}$, where
$A \subseteq \Omega$. It is interesting to note that each $\bar{A}$ is at the same time
closed, and in fact $\bar{A}$ is the closure of $A$ in $\beta\Omega$. Moreover, every open
and closed subset of $\beta\Omega$ is of the form $\bar{A}$.

A one-to-one correspondence between ultrafilters on $\Omega$ and Boolean algebra
homomorphisms $2^\Omega \to \{0, 1\}$ is this: think of an ultrafilter $\xi$ on $\Omega$ as
its own indicator function $\chi_\xi$ on $2^\Omega$, sending $A \subseteq \Omega$ to $1$ if
and only if $A \in \xi$. It is not difficult to verify that $\chi_\xi$ is a Boolean algebra
homomorphism, and that every homomorphism arises in this way.
The book [Jo] is a standard reference to the above topics.
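
For a finite $\Omega$ this correspondence can be verified by brute force. The sketch below is an illustration under our own assumptions (a three-point set, hypothetical names `is_homomorphism`, `homomorphisms`, `generators`): it enumerates all maps $2^\Omega \to \{0, 1\}$, keeps the Boolean algebra homomorphisms, and recovers the point generating each one. On a finite set every ultrafilter is principal, so non-principal ultrafilters are invisible in this toy model.

```python
from itertools import combinations, product

omega = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]
full = frozenset(omega)

def is_homomorphism(chi):
    """Check that chi: 2^omega -> {0, 1} preserves meet, join, complement, 0 and 1."""
    if chi[frozenset()] != 0 or chi[full] != 1:
        return False
    for a in subsets:
        if chi[full - a] != 1 - chi[a]:
            return False
        for b in subsets:
            if chi[a & b] != min(chi[a], chi[b]) or chi[a | b] != max(chi[a], chi[b]):
                return False
    return True

homomorphisms = []
for values in product((0, 1), repeat=len(subsets)):
    chi = dict(zip(subsets, values))
    if is_homomorphism(chi):
        homomorphisms.append(chi)

# Each homomorphism should be the indicator of a principal ultrafilter {A : x in A}.
generators = sorted(
    next(x for x in omega if all((x in a) == bool(chi[a]) for a in subsets))
    for chi in homomorphisms
)
print(len(homomorphisms), generators)
```

The search finds exactly one homomorphism per point of the set, in line with the identification of the Stone space of a finite power set with the set itself.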

Given a subset $\mathscr{C}$ of a Boolean algebra $B$, and a subset $X$ of the Stone space
$S(B)$, one can regard $\mathscr{C}$ as a set of binary functions restricted to $X$, and
compute the VC dimension of $\mathscr{C}$ over $X$. We will denote this parameter
$\mathrm{VC}(\mathscr{C} \upharpoonright X)$.

A subset $I$ of a Boolean algebra $B$ is an ideal if, whenever $x, y \in I$ and $a \in B$,
one has $x \vee y \in I$ and $a \wedge x \in I$. Define a symmetric difference on $B$ by
the formula $x \triangle y = (x \vee y) \wedge \neg(x \wedge y)$. The quotient Boolean
algebra $B/I$ consists of all equivalence classes modulo the equivalence relation
$x \sim y \iff x \triangle y \in I$. It can be easily verified to be a Boolean algebra on
its own, with operations induced from $B$ in a unique way.

The Stone space of $B/I$ can be identified with a compact topological subspace of $S(B)$,
consisting of all homomorphisms $B \to \{0, 1\}$ whose kernel contains $I$. For instance,
if $B = 2^\Omega$ and $I$ is an ideal of subsets of $\Omega$, then the Stone space of
$2^\Omega/I$ is easily seen to consist of all ultrafilters on $\Omega$ which do not contain
sets from $I$.

Theorem 2. Let $\mathscr{C}$ be a concept class on a domain $\Omega$, and let $I$ be an
ideal of sets on $\Omega$. The following conditions are equivalent.

1. The VC dimension of the (family of closures of the) concept class $\mathscr{C}$
restricted to the Stone space of the quotient algebra $2^\Omega/I$ is at least $n$:
$\mathrm{VC}(\mathscr{C} \upharpoonright S(2^\Omega/I)) \ge n$.

2. There exists a family $A_1, A_2, \ldots, A_n$ of measurable subsets of $\Omega$ not
belonging to $I$, which is shattered by $\mathscr{C}$ in the sense that if
$J \subseteq \{1, 2, \ldots, n\}$, then there is $C \in \mathscr{C}$ which contains all
sets $A_i$, $i \in J$, and is disjoint from all sets $A_j$, $j \notin J$.

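Proof. (1)$\Rightarrow$(2). Choose ultrafilters $\xi_1, \ldots, \xi_n$ in the Stone space
of the Boolean algebra $2^\Omega/I$, whose collection is shattered by $\mathscr{C}$. For
every $J \subseteq \{1, 2, \ldots, n\}$, select $C_J \in \mathscr{C}$ which carves the
subset $\{\xi_i : i \in J\}$ out of $\{\xi_1, \ldots, \xi_n\}$. This means
$C_J \in \xi_i$ if and only if $i \in J$. For all $i = 1, 2, \ldots, n$, set

$A_i = \bigcap_{J \ni i} C_J \cap \bigcap_{J \not\ni i} C_J^c$.

Then $A_i \in \xi_i$, and hence $A_i \notin I$. Furthermore, if $i \in J$, then clearly
$A_i \subseteq C_J$, and if $i \notin J$, then $A_i \cap C_J = \emptyset$. The sets $A_i$
are measurable by their definition.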

(2)$\Rightarrow$(1). Let $A_1, A_2, \ldots, A_n$ be a family of subsets of $\Omega$ not
belonging to the set ideal $I$ and shattered by $\mathscr{C}$ in the sense of the theorem.
For every $i$, the family of sets of the form $A_i \cap B^c$, $B \in I$, is a filter and so
is contained in some free ultrafilter $\xi_i$, which is clearly disjoint from $I$ and
contains $A_i$. If $J \subseteq \{1, 2, \ldots, n\}$ and $C_J \in \mathscr{C}$ contains all
sets $A_i$, $i \in J$, and is disjoint from all sets $A_i$, $i \notin J$, then the closure
$\bar{C}_J$ of $C_J$ in the Stone space contains $\xi_i$ if and only if $i \in J$. We
conclude: the collection of ultrafilters $\xi_i$, $i = 1, 2, \ldots, n$, which are all
contained in the Stone space of $2^\Omega/I$, is shattered by the closed sets $\bar{C}_J$.

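It follows in particular that the VC dimension of a concept class does not change if the
domain $\Omega$ is compactified.

Corollary 1. $\mathrm{VC}(\mathscr{C} \upharpoonright \Omega) = \mathrm{VC}(\mathscr{C} \upharpoonright \beta\Omega)$.

Proof. The inequality
$\mathrm{VC}(\mathscr{C} \upharpoonright \Omega) \le \mathrm{VC}(\mathscr{C} \upharpoonright \beta\Omega)$
is trivial. To establish the converse, assume there is a subset of $\beta\Omega$ of
cardinality $n$ shattered by $\mathscr{C}$. Choose sets $A_i$ as in Theorem 2(2). Clearly,
any subset of $\Omega$ meeting each $A_i$ at exactly one point is shattered by
$\mathscr{C}$.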

Definition 1. Given a concept class $\mathscr{C}$ on a domain $\Omega$ and an ideal $I$ of
subsets of $\Omega$, we define the VC dimension of $\mathscr{C}$ modulo $I$,

$\mathrm{VC}(\mathscr{C} \bmod I) = \mathrm{VC}(\mathscr{C} \upharpoonright S(2^\Omega/I))$.    (5)

That is, $\mathrm{VC}(\mathscr{C} \bmod I) \ge n$ if and only if any of the equivalent
conditions of Theorem 2 are met.

Definition 2. Let $\mathscr{C}$ be a concept class on a domain $\Omega$. If $I$ is the
ideal of all countable subsets of $\Omega$, we denote $\mathrm{VC}(\mathscr{C} \bmod I)$ by
$\mathrm{VC}(\mathscr{C} \bmod \omega_1)$ and call it the VC dimension modulo countable
sets.

4 Finiteness of VC dimension modulo countable sets is necessary
for learnability

Lemma 1. Every uncountable Borel subset of a standard Borel space supports a non-atomic
Borel probability measure.

Proof. Let $A$ be an uncountable Borel subset of a standard Borel space $\Omega$, that is,
$\Omega$ is a Polish space equipped with its Borel structure. According to Souslin's
theorem (see e.g. Theorem 3.2.1 in [E]), there exists a Polish (complete separable metric)
space $X$ and a continuous one-to-one mapping $f \colon X \to A$. The Polish space $X$ must
therefore be uncountable, and so supports a diffuse probability measure, $\nu$. The direct
image measure $f_*\nu$ is a Borel probability measure supported on $A$, and it is diffuse
because the inverse image of every singleton is a singleton in $X$ and thus has measure
zero.



The following result makes no measurability assumptions on the concept class. 

Theorem 3. Let $\mathscr{C}$ be a concept class on a domain $(\Omega, \mathscr{A})$ which
is a standard Borel space. If $\mathscr{C}$ is PAC learnable under non-atomic measures,
then the VC dimension of $\mathscr{C}$ modulo countable sets is finite.

Proof. This is just a minor variation of a classical result for distribution-free PAC
learnability (Theorem 2.1(i) in [BEHW]); we will follow the proof as presented in [V2],
Lemma 7.2 on p. 279.

Suppose $\mathrm{VC}(\mathscr{C} \bmod \omega_1) \ge d$. According to Theorem 2, there is a
family of uncountable Borel sets $A_i$, $i = 1, 2, \ldots, d$, shattered by $\mathscr{C}$
in our sense. Using Lemma 1, select for every $i = 1, 2, \ldots, d$ a non-atomic
probability measure $\mu_i$ supported on $A_i$, and let
$\mu = \frac{1}{d} \sum_{i=1}^{d} \mu_i$. This $\mu$ is a non-atomic Borel probability
measure, giving each $A_i$ equal weight $1/d$.

For every $d$-bit string $\sigma$ there is a concept $C_\sigma$ which contains all $A_i$
with $\sigma_i = 1$ and is disjoint from $A_i$ with $\sigma_i = 0$. If $A$ and $B$ take
constant values on all the sets $A_i$, $i = 1, 2, \ldots, d$, then $d_\mu(A, B)$ is just
the normalized Hamming distance between the corresponding $d$-bit strings. Now, given $A$
and $0 \le k \le d$, there are $\binom{d}{k}$ such concepts $B$ at normalized Hamming
distance exactly $k/d$ from $A$, and therefore

$\sum_{k \le 2\epsilon d} \binom{d}{k}$

concepts $B$ with $d_\mu(A, B) \le 2\epsilon$. This allows one to get the following lower
bound on the number of pairwise $2\epsilon$-separated concepts:

$\frac{2^d}{\sum_{k \le 2\epsilon d} \binom{d}{k}}$.

The Chernoff-Okamoto bound allows one to estimate the above expression from below by
$\exp[2(0.5 - 2\epsilon)^2 d]$. We conclude: the metric entropy of $\mathscr{C}$ with
regard to $\mu$ is bounded below as:

$M(2\epsilon, \mathscr{C}, \mu) \ge \exp[2(0.5 - 2\epsilon)^2 d]$.

The assumption $\mathrm{VC}(\mathscr{C} \bmod \omega_1) = \infty$ now implies that for
every $0 < \epsilon < 0.25$,

$\sup_{\mu \in \mathcal{P}} M(2\epsilon, \mathscr{C}, \mu) = \infty$,

where $\mathcal{P}$ denotes the family of all non-atomic measures on $\Omega$. By Lemma 7.1
in [V2], p. 278, the class $\mathscr{C}$ is not PAC learnable under $\mathcal{P}$.
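
The counting step in this proof is easy to check numerically. The sketch below (function names are our own, and the chosen values of $d$ and $\epsilon$ are arbitrary) compares the ratio $2^d / \sum_{k \le 2\epsilon d} \binom{d}{k}$ with the Chernoff-Okamoto estimate $\exp[2(0.5 - 2\epsilon)^2 d]$.

```python
from math import comb, exp, floor

def separated_lower_bound(d, eps):
    """2^d divided by the size of a Hamming ball of normalized radius 2*eps in {0,1}^d."""
    radius = floor(2 * eps * d)
    ball = sum(comb(d, k) for k in range(radius + 1))
    return 2 ** d / ball

def chernoff_okamoto_estimate(d, eps):
    """The closed-form lower estimate exp[2 (0.5 - 2 eps)^2 d]."""
    return exp(2 * (0.5 - 2 * eps) ** 2 * d)

checks = []
for d in (10, 50, 200):
    for eps in (0.05, 0.1, 0.2):
        exact = separated_lower_bound(d, eps)
        estimate = chernoff_okamoto_estimate(d, eps)
        checks.append(exact >= estimate)
        print(d, eps, exact >= estimate)
```

Both quantities grow exponentially in $d$ for each fixed $\epsilon < 0.25$, which is what forces the metric entropy to be unbounded when clusters of arbitrarily large size $d$ can be shattered.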



5 The universally separable case 



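Lemma 2. Let $\mathscr{C}$ be a universally separable concept class, and let
$\mathscr{C}'$ be a universally dense countable subset of $\mathscr{C}$. Then

$\mathrm{VC}(\mathscr{C}) = \mathrm{VC}(\mathscr{C}')$.

Proof. For every $C \in \mathscr{C}$ there is a sequence $(C_n)$ of elements of
$\mathscr{C}'$ with the property that for each $x \in \Omega$ there is $N$ such that if
$n \ge N$ and $x \in C$, then $x \in C_n$, and if $x \notin C$, then $x \notin C_n$.
Equivalently, for every finite $A \subseteq \Omega$, there is an $N$ so that whenever
$n \ge N$, one has $C_n \cap A = C \cap A$. This means that if $A$ is shattered by
$\mathscr{C}$, it is equally well shattered by $\mathscr{C}'$. This establishes the
inequality $\mathrm{VC}(\mathscr{C}) \le \mathrm{VC}(\mathscr{C}')$, while the converse
inequality is obviously true.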

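Theorem 4. For a universally separable concept class $\mathscr{C}$, the following
conditions are equivalent.

1. $\mathrm{VC}(\mathscr{C} \bmod \omega_1) \le d$.

2. There exists a countable subset $A \subseteq \Omega$ such that
$\mathrm{VC}(\mathscr{C} \upharpoonright (\Omega \setminus A)) \le d$.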

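Proof. (1)$\Rightarrow$(2): Choose a countable universally dense subfamily $\mathscr{C}'$
of $\mathscr{C}$. Let $\mathscr{B}$ be the smallest Boolean algebra of subsets of $\Omega$
containing $\mathscr{C}'$. Denote by $A$ the union of all elements of $\mathscr{B}$ that
are countable sets. Clearly, $\mathscr{B}$ is countable, and so $A$ is a countable set.

Let a finite set $B \subseteq \Omega \setminus A$ be shattered by $\mathscr{C}$. Then, by
Lemma 2, it is shattered by $\mathscr{C}'$. Select a family $\mathscr{S}$ of $2^{|B|}$
sets in $\mathscr{C}'$ shattering $B$. For every $b \in B$ the set

$[b] = \bigcap_{C \in \mathscr{S},\, b \in C} C \cap \bigcap_{C \in \mathscr{S},\, b \notin C} C^c$

is uncountable (for it belongs to $\mathscr{B}$ yet is not contained in $A$), and the
collection of sets $[b]$, $b \in B$, is shattered by $\mathscr{C}'$. This establishes the
inequality
$\mathrm{VC}(\mathscr{C} \upharpoonright (\Omega \setminus A)) \le \mathrm{VC}(\mathscr{C} \bmod \omega_1)$.

(2)$\Rightarrow$(1): Fix a countable $A \subseteq \Omega$ so that
$\mathrm{VC}(\mathscr{C} \upharpoonright (\Omega \setminus A)) \le d$. Suppose a collection
of $n$ uncountable sets $A_i$, $i = 1, 2, \ldots, n$, is shattered by $\mathscr{C}$ in our
sense. The sets $A_i \setminus A$ are non-empty; pick a representative
$a_i \in A_i \setminus A$, $i = 1, 2, \ldots, n$. The resulting set
$\{a_1, \ldots, a_n\}$ is shattered by $\mathscr{C}$, meaning $n \le d$.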

Corollary 2. Let $\mathscr{C}$ be a universally separable concept class on a Borel domain
$\Omega$. If $d = \mathrm{VC}(\mathscr{C} \bmod \omega_1) < \infty$, then $\mathscr{C}$ is
a universal Glivenko-Cantelli class with regard to non-atomic measures and consistently PAC
learnable under non-atomic measures.

Proof. The class $\mathscr{C}$ has finite VC dimension in the complement to a suitable
countable subset $A$ of $\Omega$, hence $\mathscr{C}$ is a universal Glivenko-Cantelli
class (in the classical sense) in the standard Borel space $\Omega \setminus A$. But $A$ is
a universal null set in $\Omega$, hence clearly $\mathscr{C}$ is universal
Glivenko-Cantelli with regard to non-atomic measures.

The class $\mathscr{C}$ is distribution-free consistently PAC learnable in the domain
$\Omega \setminus A$, with the standard sample complexity $s(\epsilon, \delta, d)$. Let
$\mathcal{L}$ be any consistent learning rule for $\mathscr{C}$ in $\Omega$. The
restriction of $\mathcal{L}$ to $\Omega \setminus A$ (more exactly, to
$\bigcup_{n=1}^{\infty} ((\Omega \setminus A)^n \times \{0, 1\}^n)$) is a consistent
learning rule for $\mathscr{C}$ restricted to the standard Borel space
$\Omega \setminus A$, and together with the fact that $A$ has measure zero with regard to
any non-atomic measure, it implies that $\mathcal{L}$ is a PAC learning rule for
$\mathscr{C}$ under non-atomic measures, with the same sample complexity function
$s(\epsilon, \delta, d)$.



6 Martin's Axiom and learnability 

Martin's Axiom (MA) in one of its equivalent forms says that no compact Hausdorff 
topological space with the countable chain condition is a union of strictly less than con- 
tinuum nowhere dense subsets. Thus, it can be seen as a strengthening of the statement 
of the Baire Category Theorem. In particular, the Continuum Hypothesis (CH) implies 
MA. However, MA is compatible with the negation of CH, and this is where the most 
interesting applications of MA are to be found. We will be using just one particular 
consequence of MA. 

Theorem 5 (Martin-Solovay). Let $(\Omega, \mu)$ be a standard Lebesgue non-atomic
probability space. Under MA, the Lebesgue measure is $2^{\aleph_0}$-additive, that is, if
$\kappa < 2^{\aleph_0}$ and $A_\alpha$, $\alpha < \kappa$, is a family of pairwise disjoint
measurable sets, then $\bigcup_{\alpha < \kappa} A_\alpha$ is Lebesgue measurable and

$\mu\left(\bigcup_{\alpha < \kappa} A_\alpha\right) = \sum_{\alpha < \kappa} \mu(A_\alpha)$.

In particular, the union of less than continuum null subsets of $\Omega$ is a null subset. □

For the proof and more on MA, see [K], Theorem 2.21, or [F], or [Je], pp. 563-565.

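Lemma 3. Let $\mathscr{C}$ be an infinite concept class on a measurable space $\Omega$.
Denote $\kappa = |\mathscr{C}|$ the cardinality of $\mathscr{C}$. There exists a consistent
learning rule $\mathcal{L}$ for $\mathscr{C}$ with the property that for every
$C \in \mathscr{C}$ and each $n$, the set

$\{\mathcal{L}(\sigma, C \cap \sigma) : \sigma \in \Omega^n\} \subseteq \mathscr{C}$    (6)

has cardinality $< \kappa$. Under MA the rule $\mathcal{L}$ satisfies the measurability
condition (1).

Proof. Choose a minimal well-ordering of elements of $\mathscr{C}$:

$\mathscr{C} = \{C_\alpha : \alpha < \kappa\}$,

and set for every $\sigma \in \Omega^n$ and $\tau \in \{0, 1\}^n$ the value
$\mathcal{L}(\sigma, \tau)$ equal to $C_\beta$, where

$\beta = \min\{\alpha < \kappa : C_\alpha \cap \sigma = \tau\}$,

provided such a $\beta$ exists. Clearly, for each $\alpha < \kappa$ one has

$\mathcal{L}(\sigma, C_\alpha \cap \sigma) \in \{C_\beta : \beta \le \alpha\}$,

which assures (6). Besides, the learning rule $\mathcal{L}$ is consistent.

Fix $C = C_\alpha \in \mathscr{C}$, $\alpha < \kappa$. For every $\beta \le \alpha$ define
$D_\beta = \{\sigma \in \Omega^n : C \cap \sigma = C_\beta \cap \sigma\}$. The sets
$D_\beta$ are measurable, and the function

$\Omega^n \ni \sigma \mapsto \mu(\mathcal{L}(\sigma, C \cap \sigma) \triangle C) \in \mathbb{R}$

takes a constant value $\mu(C_\beta \triangle C_\alpha)$ on each set
$D_\beta \setminus \bigcup_{\gamma < \beta} D_\gamma$, $\beta \le \alpha$. Such sets, as
well as all their possible unions, are measurable under MA by force of Martin-Solovay's
Theorem 5, and their union is $\Omega^n$. This implies the condition (1) for $\mathcal{L}$.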
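
The construction of $\mathcal{L}$ can be imitated in a finite setting, where a list plays the role of the well-ordering. The sketch below is only a toy model under our own assumptions (the name `first_consistent_rule`, the four-element enumeration and the four-point domain are hypothetical); it checks that the image of all samples labelled by $C_\alpha$ stays within $\{C_\beta : \beta \le \alpha\}$.

```python
def first_consistent_rule(enumeration):
    """Learning rule picking the first concept (in the fixed enumeration) that
    agrees with the labelled sample; returns its index, or None if none agrees."""
    def rule(sample, labels):
        for index, concept in enumerate(enumeration):
            if all((x in concept) == y for x, y in zip(sample, labels)):
                return index
        return None
    return rule

# Hypothetical finite enumeration C_0, C_1, ... standing in for the well-ordering.
enumeration = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({0, 1, 2, 3})]
rule = first_consistent_rule(enumeration)

domain = range(4)
alpha = 2                                    # index of the target concept C_alpha
target = enumeration[alpha]
images = set()
for a in domain:
    for b in domain:
        sample = (a, b)
        labels = tuple(x in target for x in sample)
        images.add(rule(sample, labels))
print(sorted(images))
```

Since the target itself is always consistent with its own labelling, the search never runs past index $\alpha$; this is the finite analogue of the cardinality bound in (6).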

We again recall that a set $A \subseteq \Omega$ is absolutely null if it is Lebesgue
measurable with regard to every non-atomic Borel probability measure $\mu$ on $\Omega$ and
$\mu(A) = 0$.

Lemma 4 (Assuming MA). Let $\mathscr{C}$ be a class of Borel subsets on a standard Borel
space $\Omega$. Suppose there is a natural $d$ such that every countable subclass
$\mathscr{C}' \subseteq \mathscr{C}$ has VC dimension $\le d$ outside of an absolutely null
set (which depends on $\mathscr{C}'$). Then every subclass of $\mathscr{C}$ of cardinality
$< 2^{\aleph_0}$ has the same property.

Proof. By induction on the cardinality of the subclass $\mathscr{C}'$, which we denote
$\alpha$ (notice that it never exceeds $2^{\aleph_0}$, and so the proof only makes sense
under the negation of the Continuum Hypothesis). Suppose the result is true for all
$\beta$, $\aleph_0 \le \beta < \alpha$. Choose a minimally well-ordered chain
$\mathscr{C}_\gamma$, $\gamma < \alpha$, of subclasses of $\mathscr{C}'$ whose union is
$\mathscr{C}'$. For every $\gamma$, let $N_\gamma$ be a universal null subset of $\Omega$
with the property that $\mathscr{C}_\gamma$ has VC dimension $\le d$ outside of
$N_\gamma$. Martin-Solovay's Theorem implies that
$N = \bigcup_{\gamma < \alpha} N_\gamma$ is absolutely null. Consequently, each
$\mathscr{C}_\gamma$ has VC dimension $\le d$ outside of $N$, and the same applies to the
union of the chain.

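Lemma 5 (Assuming MA). Let $\mathscr{C}$ be a concept class of cardinality
$\kappa = |\mathscr{C}| < 2^{\aleph_0}$ on a standard Borel space $\Omega$. If
$d = \mathrm{VC}(\mathscr{C})$ is finite, then $\mathscr{C}$ is a uniform Glivenko-Cantelli
class, with a standard sample complexity estimate $s(\epsilon, \delta, d)$.

Proof. A transfinite induction on $\kappa$. For $\kappa = \aleph_0$ the result is
classical. Else, represent $\mathscr{C}$ as a union of an increasing transfinite chain of
concept classes $\mathscr{C}_\alpha$, $\alpha < \kappa$, for each of which the statement of
the Lemma holds. For every $\epsilon > 0$ and $n \in \mathbb{N}$, the set

$\{\sigma \in \Omega^n : \sup_{C \in \mathscr{C}} |\mu_n(C) - \mu(C)| \le \epsilon\} = \bigcap_{\alpha < \kappa} \{\sigma \in \Omega^n : \sup_{C \in \mathscr{C}_\alpha} |\mu_n(C) - \mu(C)| \le \epsilon\}$,

where $\mu_n$ is the empirical measure supported on the sample $\sigma$, is measurable by
Martin-Solovay's Theorem 5. Given $\delta > 0$ and $n \ge s(\epsilon, \delta, d)$, another
application of the same result leads to conclude that for every $\mu \in P(\Omega)$:

$\mu^{\otimes n}\{\sigma \in \Omega^n : \sup_{C \in \mathscr{C}} |\mu_n(C) - \mu(C)| \le \epsilon\}$
$= \mu^{\otimes n}\left(\bigcap_{\alpha < \kappa} \{\sigma \in \Omega^n : \sup_{C \in \mathscr{C}_\alpha} |\mu_n(C) - \mu(C)| \le \epsilon\}\right)$
$= \inf_{\alpha < \kappa} \mu^{\otimes n}\{\sigma \in \Omega^n : \sup_{C \in \mathscr{C}_\alpha} |\mu_n(C) - \mu(C)| \le \epsilon\}$
$\ge 1 - \delta$,

as required.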



The following is an immediate consequence of two previous lemmas. 

Lemma 6 (Assuming MA). Under the assumptions of Lemma 4, every subclass of $\mathscr{C}$ of
cardinality $< 2^{\aleph_0}$ is uniform Glivenko-Cantelli with regard to the family of
non-atomic measures on $\Omega$. The sample complexity of this class is the usual sample
complexity $s(\delta, \epsilon, d)$ of concept classes of VC dimension $\le d$.

Lemma 7 (Assuming MA). Let $\mathscr{C}$ be a concept class consisting of Borel subsets of
a standard Borel space $\Omega$. Assume that for some natural $d$, every countable subclass
of $\mathscr{C}$ has VC dimension $\le d$ outside of some universal null subset of
$\Omega$. Then the class $\mathscr{C}$ is PAC learnable under the family of all non-atomic
measures on $\Omega$, with the usual sample complexity $s(\delta, \epsilon)$ of
distribution-free PAC learning concept classes of VC dimension $\le d$.

Proof. Using Lemma 3, choose a learning rule $\mathcal{L}$ for $\mathscr{C}$ with the
property in Eq. (6). Since the family of all Borel subsets of $\Omega$ is well-known to
have cardinality continuum, for every concept $C \in \mathscr{C}$ and each $n$ the
cardinality of the image
$\{\mathcal{L}(\sigma, C \cap \sigma) : \sigma \in \Omega^n\} \subseteq \mathscr{C}$ is
strictly less than $2^{\aleph_0}$. By Lemma 6, this image is a uniform Glivenko-Cantelli
class with regard to non-atomic measures on $\Omega$, satisfying the standard sample
complexity bound. The proof is now concluded in a standard way.



7 The proof of the main theorem 

(1)$\Rightarrow$(2): this is Theorem 3.
(2)$\Rightarrow$(3): follows from Theorem 4.

(3)$\Rightarrow$(4): assume that for every $d$ there is a countable subclass
$\mathscr{C}_d$ of $\mathscr{C}$ with the property that the VC dimension of $\mathscr{C}_d$
is $\ge d$ after removing any countable subset of $\Omega$. Clearly, the countable class
$\bigcup_{d=1}^{\infty} \mathscr{C}_d$ will have infinite VC dimension outside of every
countable subset of $\Omega$, a contradiction.

(4)$\Rightarrow$(6): as a consequence of a classical result of Vapnik and Chervonenkis,
every countable subclass $\mathscr{C}'$ is universal Glivenko-Cantelli with regard to all
probability measures supported outside of some countable subset of $\Omega$, and a standard
bound for the sample complexity $s(\delta, \epsilon)$ only depends on $d$, from which the
statement follows.
(6)$\Rightarrow$(5): trivial.

(5)$\Rightarrow$(3): modelling the classical argument that the uniform Glivenko-Cantelli
property implies finite VC dimension, in exactly the same spirit as in the proof of our
Theorem 3, one shows that the uniform Glivenko-Cantelli property of a concept class with
regard to non-atomic measures implies a finite VC dimension modulo countable sets. But for
a countable (more generally, universally separable) class $\mathscr{C}'$ this means finite
VC dimension after a removal of a countable set, cf. Theorem 4.
(3)$\Rightarrow$(1): this is Lemma 7, and the only implication requiring Martin's Axiom.

The equivalence of (7) and (8) in the universally separable case follows from
Theorem 4 and Corollary 2. □



8 Conclusion 



We have characterized concept classes $\mathscr{C}$ that are distribution-free PAC
learnable under the family of all non-atomic probability measures on the domain. The
criterion is obtained without any measurability conditions on the concept class, but at the
expense of making a set-theoretic assumption in the form of Martin's Axiom. In fact,
assuming MA makes things easier, and as this axiom is very natural, perhaps it deserves its
small corner within the foundations of statistical learning.

It seems that generalizing the result from concept to function classes, using a version
of the fat shattering dimension modulo countable sets, will not pose particular technical
difficulties, and we plan to perform this extension in a full journal version of the paper,
in order to keep the conference submission short. The Boolean algebras will however
have to give way to commutative C*-algebras [A].

It would still be interesting to know if the present results hold without Martin's Axiom,
under the assumption that the concept class $\mathscr{C}$ is image admissible Souslin
([D], pages 186-187). The difficulty here is selecting a measurable learning rule
$\mathcal{L}$ with the property that the images of all learning samples
$(\sigma, C \cap \sigma)$, $\sigma \in \Omega^n$, are uniform Glivenko-Cantelli. An obvious
route to pursue is the recursion on the Borel rank of $\mathscr{C}$, but we were unable to
follow it through.

Now, a concept class $\mathscr{C}$ will be learnable under diffuse measures provided there
is a hypothesis class $\mathscr{H}$ which has finite VC dimension and such that every
$C \in \mathscr{C}$ differs from a suitable $H \in \mathscr{H}$ by a null set. If
$\mathscr{C}$ consists of all finite and all cofinite subsets of $\Omega$, this
$\mathscr{H}$ is given by $\{\emptyset, \Omega\}$. One may conjecture that $\mathscr{C}$ is
learnable under diffuse measures if and only if it admits such a "core" $\mathscr{H}$
having finite VC dimension. Is this true?

Another natural question is: can one characterize concept classes that are uniformly
Glivenko-Cantelli with regard to all non-atomic measures? Apparently, this task requires
yet another version of shattering dimension, which is strictly intermediate between
Talagrand's "witness of irregularity" [T] and our VC dimension modulo countable sets. We do
not have a viable candidate.

Finally, our investigation opens up a possibility of linking learnability and VC dimension
to Boolean algebras and their Stone spaces. This could be a glib exercise in generalization
for its own sake, or maybe something deeper if one manages to invoke model theory and
forcing.